PSM: A New Re-Ranking Algorithm for Named-Page

Authors

Jiafeng Guo

Lin Ding

Gang Zhang

Yue Liu

Xueqi Cheng

Abstract

This year, the IR group of ICT participated in the named-page finding subtask of the Terabyte Track for the first time. Because the document collection is about 426 GB, our primary goal was to find an efficient way to locate the target web page in such a large data set, while keeping the cost of indexing and retrieval reasonably low in both hardware and time. We used our “FirteX” engine for indexing and retrieval. Indexing took under 15 hours, and retrieval took less than 2 seconds per query. The main contribution of our work is a Pattern Similarity Matching (PSM) re-ranking algorithm that reorders the results so that the target document is ranked first as often as possible. In our experiments, the approach performed well on last year's (2005) topics. Our procedure consists of three parts: data preprocessing, indexing and retrieval, and re-ranking.

Similar resources

A New Hybrid Method for Web Pages Ranking in Search Engines

There are many algorithms for optimizing search engine results; ranking takes place according to one or more parameters, such as backward links, forward links, content, and click-through rate. The quality and performance of these algorithms depend on the listed parameters. Ranking is one of the most important components of a search engine that represents the degree of the vitality...

Incremental Web Search: Tracking Changes in the Web

A large amount of new information is posted on the Web every day. Large-scale web search engines often update their indexes slowly and are unable to present such information in a timely manner. In this thesis, we present our solutions for searching for new information on the web by tracking changes in web documents. First, we present the algorithms and techniques useful for solving the following...

Efficient Algorithm for Mining on Bio Medical Data for Ranking the Web Pages

Information on the internet is growing in volume across different sources. Extracting tuples from HTML pages has been an important issue in various web applications such as web data integration, e-commerce market monitoring, and mashups that repurpose and selectively combine existing web data services. Data mining is the process of analyzing data from different perspectives and...

A Frequency Mining-Based Algorithm for Re-ranking Web Search Engine Retrievals

Conventional web search engines retrieve too many documents for the majority of submitted queries; they therefore have good recall, since there are far more pages than a user can look at. Precision, however, is a critical factor in these conditions, because the most relevant documents should be presented at the top of the list. In this paper, we propose an online page re-rank model whi...

Publication date: 2006
